Domain adaptation methods in the IBM trainable text-to-speech system
Authors
Abstract
This paper presents a comparison of domain adaptation techniques for a unit-selection-based text-to-speech system. The methods under investigation address two different conditions: the absence and the existence of additional domain-specific training prompts spoken by the original voice talent. In the first case we employ domain-specific pre-selection; in the second we compare a variety of methods that range from a simple extension of the segment inventory to a complete reconstruction of the system, which also includes training decision trees for the domain-dependent prediction of prosody targets. An experimental evaluation of these methods shows significant improvements (up to 1.1 on a 5-point MOS scale) over the baseline system for sentences from the target domain, with no significant degradation when synthesizing sentences from domains other than the adaptation domain.
Similar resources
Current status of the IBM Trainable Speech Synthesis System
This paper describes the current status of the IBM Trainable Speech Synthesis System. The system is a state-of-the-art, trainable, unit-selection based concatenative speech synthesiser. The system uses hidden Markov models (HMMs) to provide a phonetic transcription and HMM state alignment of a database of single-speaker continuous-speech training data. The runtime synthesiser uses the HMM state...
Pointwise Prediction and Sequence-Based Reranking for Adaptable Part-of-Speech Tagging
This paper proposes an accurate method for part-of-speech (POS) tagging that is highly domain-adaptable. The method is based on an assumption that the POS transition tendencies do not depend on domains, and has the following three characteristics: 1) it is trainable from partially annotated data, 2) it uses efficiently trainable pointwise POS taggers to allow for active learning, and 3) is more ...
Reducing the footprint of the IBM trainable speech synthesis system
This paper presents a novel approach for concatenative speech synthesis. This approach enables reduction of the dataset size of a concatenative text-to-speech system, namely the IBM trainable speech synthesis system, by more than an order of magnitude. A spectral acoustic feature based speech representation is used for computing a cost function during segment selection as well as for speech gen...
Phrase splicing and variable substitution using the IBM trainable speech synthesis system
This paper describes a phrase splicing and variable substitution system which offers an intermediate form of automated speech production lying between the extremes of recorded utterance playback and full Text-to-Speech synthesis. The system incorporates a trainable speech synthesiser and an application specific set of pre-recorded phrases. The text to be synthesised is converted to a phone se...
Unsupervised Vocabulary Adaptation for Morph-based Language Models
Modeling of foreign entity names is an important unsolved problem in morpheme-based modeling that is common in morphologically rich languages. In this paper we present an unsupervised vocabulary adaptation method for morph-based speech recognition. Foreign word candidates are detected automatically from in-domain text through the use of letter n-gram perplexity. Over-segmented foreign entity na...
Publication date: 2004